Snapshot restore of application chains and applications

ABSTRACT

The present invention saves all process state, memory, and dependencies related to a software application to a snapshot image. Interprocess communication (IPC) mechanisms such as shared memory and semaphores must be preserved in the snapshot image as well. An IPC mechanism is any resource that is shared between two processes, or any communication mechanism or channel that allows two processes to communicate or interoperate. Between snapshots, memory deltas are flushed to the snapshot image, so that only the modified pages need be updated. Software modules are included to track usage of resources and their corresponding handles. At snapshot time, state is saved by querying the operating system kernel, the application snapshot/restore framework components, and the process management subsystem that allows applications to retrieve internal process-specific information not available through existing system calls. At restore time, the reverse sequence of steps for the snapshot procedure is followed and state is restored by making requests to the kernel, the application snapshot/restore framework, and the process management subsystem.

REFERENCE TO RELATED APPLICATIONS

The present application claims priority to and incorporates the following applications by reference: DYNAMIC SYMBOLIC LINK RESOLUTION, Prov. No. 60/157,728, filed on Oct. 5, 1999; SNAPSHOT VIRTUAL TEMPLATING, Prov. No. 60/157,728, filed on Oct. 5, 1999; SNAPSHOT RESTORE OF APPLICATION CHAINS AND APPLICATIONS, Prov. No. 60/157,833, filed on Oct. 5, 1999; VIRTUAL RESOURCE-ID MAPPING, Prov. No. 60/157,727, filed on Oct. 5, 1999; and VIRTUAL PORT MULTIPLEXING, Prov. No. 60/157,834, filed on Oct. 5, 1999.

FIELD OF THE INVENTION

The present invention relates broadly to computer networks. Specifically, the present invention relates to adaptively scheduling applications on-demand onto computers in a computer network. More specifically, the present invention relates to making a snapshot image of a running application including data and state information, and restoring a running application from the snapshot image.

BACKGROUND

Global computer networks such as the Internet have allowed electronic commerce (“e-commerce”) to flourish to a point where a large number of customers purchase goods and services over websites operated by online merchants. Because the Internet provides an effective medium to reach this large customer base, online merchants who are new to the e-commerce marketplace are often flooded with high customer traffic from the moment their websites are rolled out. In order to effectively serve customers, online merchants are charged with the same responsibility as conventional merchants: they must provide quality service to customers in a timely manner. Often, insufficient computing resources are the cause of a processing bottleneck that results in customer frustration and loss of sales. This phenomenon has resulted in the need for a new utility: leasable on-demand computing infrastructure. Previous attempts at providing computing resources have entailed leasing large blocks of storage and processing power. However, for a new online merchant having no baseline from which to judge customer traffic upon rollout, this approach is inefficient. Either too many computing resources are leased, depriving a start-up merchant of financial resources that are needed elsewhere in the operation, or not enough resources are leased, and a bottleneck occurs.

To make an on-demand computer infrastructure possible, computer applications must be ported across computer networks to different processing locations. However, this approach is costly in terms of overhead, because the applications to be moved across the network must be saved, shut down, stored, ported and then restored and re-initialized with the previously running data. The overhead is prohibitive and negates any performance improvements realized by transferring the application to another computer. Thus, there remains a heartfelt need for a system and method for effecting a transfer of applications across computer networks without incurring costly processing overhead.

SUMMARY OF THE INVENTION

The present invention solves the problems described above by saving all process state, memory, and dependencies related to a software application to a snapshot image. Interprocess communication (IPC) mechanisms such as shared memory and semaphores must be preserved in the snapshot image as well. An IPC mechanism is any resource that is shared between two processes, or any communication mechanism or channel that allows two processes to communicate or interoperate. Sockets, shared memory, semaphores and pipes are some examples of IPC mechanisms. Between snapshots, memory deltas are flushed to the snapshot image, so that only the modified pages need be updated. Software modules that track usage of resources and their corresponding handles are included as part of the snapshot/restore framework of the present invention. At snapshot time, state is saved by querying the operating system kernel, the application snapshot/restore framework components, and the process management subsystem that allows applications to retrieve internal process-specific information not available through existing system calls. At restore time, the reverse sequence of steps for the snapshot procedure is followed and state is restored by making requests to the kernel, the application snapshot/restore framework, and the process management subsystem.

These and many other attendant advantages of the present invention will be understood upon reading the following detailed description in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a high level block diagram illustrating the various components of a computer network used in connection with the present invention;

FIG. 2 is a high level block diagram illustrating the various components of a computer used in connection with the present invention;

FIG. 3 illustrates how application state is tracked using library and operating system kernel interposition;

FIG. 4 illustrates the capture of an application's run-time state;

FIG. 5 is a flow chart illustrating the logical sequence of steps executed to create a snapshot image of an application instance;

FIG. 6 is a flow chart illustrating the logical sequence of steps executed to restore an application instance from a snapshot image;

FIG. 7 is an illustration of the format of a snapshot virtual template;

FIG. 8 is a flowchart illustrating the logical sequence of steps executed to create a snapshot virtual template;

FIG. 9 is a flowchart illustrating the logical sequence of steps executed to clone a snapshot virtual template;

FIG. 10 illustrates the registration of an application using virtual resource identifiers;

FIG. 11 illustrates the allocation of a virtual resource;

FIG. 12 illustrates the translation of a virtual resource to a system resource;

FIG. 13 illustrates the translation of a system resource to a virtual resource;

FIG. 14 is a flowchart illustrating the logical sequence of steps executed to create a virtual translation table; and

FIG. 15 is a flowchart illustrating the logical sequence of steps executed to translate a virtual resource.

DETAILED DESCRIPTION

A. Snapshot Restore

FIG. 1 illustrates in high level block diagram form the overall structure of the present invention as used in connection with a global computer network 100 such as the Internet. Remote users 102-1 and 102-2 can connect through the computer network 100 to a private network of computers 106 protected by firewall 104. Computer network 106 is a network comprising computers 150-1, 150-2, through 150-n, where n is the total number of computers in network 106. Computers 150 are used to run various applications, as well as host web sites for access by remote users 102. The present invention is implemented on computer network 106 in the form of virtual environments 110-1 and 110-2. While only two virtual environments are illustrated, it is to be understood that any number of virtual environments may be utilized in connection with the present invention.

FIG. 2 illustrates in high level block diagram form a computer that may be utilized in connection with the present invention. Computer 150 incorporates a processor 152 utilizing a central processing unit (CPU) and supporting integrated circuitry. Memory 154 may include RAM and NVRAM such as flash memory, to facilitate storage of software modules executed by processor 152, such as application snapshot/restore framework 200. Also included in computer 150 are keyboard 158, pointing device 160, and monitor 162, which allow a user to interact with computer 150 during execution of software programs. Mass storage devices such as disk drive 164 and CD ROM 166 may also be in computer 150 to provide storage for computer programs and associated files. Computer 150 may communicate with other computers via modem 168 and telephone line 170 to allow the computer 150 to be operated remotely, or utilize files stored at different locations. Other media may also be used in place of modem 168 and telephone line 170, such as a direct connection or high speed data line. The components described above may be operatively connected by a communications bus 172.

FIG. 3 shows how application states are tracked via library and kernel interposition. The application snapshot/restore framework 200 is a software module that processes transactions between the operating system 206 and the applications 208. Requests for system resources or changes to process state are routed internally and the application snapshot/restore framework 200 tracks these events in anticipation of a snapshot. The application snapshot/restore framework 200 is transparent to running (and snapshotted) applications. From an application's perspective, the application is always running. An application snapshot may consist of multiple processes and multiple threads and includes shared resources in use by a process, such as shared memory or semaphores. A process may be snapshotted and restored more than once. The computer on which a process is restored must be identically configured and have an environment (hardware, software, and files) identical to that of the computer where the process was snapshotted. All processes that are snapshotted together in the form of an application chain share the same application ID (“AID”). As used herein, an application chain is the logical grouping of a set of applications and processes that communicate with each other and share resources to provide a common function.

The virtual environment 204 is a layer that surrounds application(s) 208 and resides between the application and the operating system 206. Resource handles are abstracted to present a consistent view to the application although the actual system resource handles may change as an application is snapshotted and restored more than once. The virtual environment also allows multiple applications to compete for the same resources where exclusion would normally prohibit such behavior, so that multiple snapshots can coexist without reconfiguration. Preload library 214 is an application library that interposes upon an application for the express purpose of intercepting and handling library calls and system calls. Once the library has been preloaded it is attached to the process' address space. Preload library 214 interposes between application 208 and operating system 206. It is distinguished from kernel interposition in that it operates in “user mode” (i.e., non-kernel and non-privileged mode). Application 208 can make application programming interface (API) calls that modify the state of the application. These calls are made from the application 208 to the operating system API interfaces 210 via the application snapshot/restore framework 200 or the preload library 214. The preload library can save the state of various resources by intercepting API interface calls and then saving the state at a pre-arranged memory location. When the process' memory is saved as part of the snapshot/restore mechanism, this state is saved since it resides in memory. The state as it is modified is saved to non-volatile storage (i.e., a file on disk). The preload library notifies the snapshot/restore framework through one of its private interfaces.
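As a rough illustration of the user-mode interposition described above, the following Python sketch wraps a file-opening call so that each handle is recorded in a tracking table before it is returned to the caller. The names `ResourceTracker`, `tracked_open`, and `tracked_close` are hypothetical, and the in-memory dictionary merely stands in for the pre-arranged memory location; an actual preload library would interpose at the C library level rather than in application code.

```python
import os
import tempfile


class ResourceTracker:
    """Hypothetical stand-in for the state a preload library keeps in memory."""

    def __init__(self):
        self.open_files = {}  # file descriptor -> (path, mode)

    def tracked_open(self, path, mode="r"):
        # Forward the call to the real API, then record the resulting handle.
        f = open(path, mode)
        self.open_files[f.fileno()] = (path, mode)
        return f

    def tracked_close(self, f):
        # Drop the record before releasing the handle.
        self.open_files.pop(f.fileno(), None)
        f.close()


def demo():
    """Open and close one file, returning the table during and after use."""
    tracker = ResourceTracker()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.txt")
        f = tracker.tracked_open(path, "w")
        during = dict(tracker.open_files)
        tracker.tracked_close(f)
    return during, tracker.open_files
```

Because the table lives in the process' own memory, it would be captured together with the rest of the address space when the process is snapshotted.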

FIG. 4 illustrates the capture of an application's run-time state. The OS API interfaces 210 are standard programming interfaces defined by international standards organizations such as XOPEN. The open() system call, which allows an application to open a file for reading, is an example of an API interface. The process management system 216 is a component of the operating system 206 that allows one process to examine or alter the state of another process. The interfaces that are provided by this component are usually not standardized interfaces (not part of a recognized standard API) and are OS-implementation dependent. However, such interfaces usually allow access to more state than standardized API interfaces. The run-time information captured from a process is used by the snapshot driver 218.

An application needs to be snapshotted if it is idle and is not currently in use, or if there are higher priority requests that require the application be scheduled out and preempted in favor of another application. A snapshot request is initiated by an application scheduler that determines when an application needs to be restored on-demand and when the application is no longer needed (can be snapshotted to free up resources). The application scheduler does this based on web traffic, server load, request response time, and a number of other factors. An application needs to be restored if there is an incoming request (e.g., a web browser request) and the application required to handle that request (e.g., a particular web site) is not currently running. Alternatively, an application needs to be restored if there is an incoming request (e.g., a web browser request) and the application required to handle that request (e.g., a particular web site) is currently overloaded, so another instance of that application is restored to handle that request.
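The snapshot and restore conditions in the preceding paragraph can be summarized as a small decision function. This is only a sketch: the `AppStatus` fields and the `schedule` function are hypothetical names, and an actual scheduler would derive these flags from web traffic, server load, response time, and other factors.

```python
from dataclasses import dataclass


@dataclass
class AppStatus:
    running: bool           # an instance is currently restored and running
    idle: bool              # running but not currently in use
    overloaded: bool        # running but unable to absorb more requests
    incoming_request: bool  # a request for this application has arrived
    preempted: bool         # a higher-priority request needs its resources


def schedule(status: AppStatus) -> str:
    """Return 'restore', 'snapshot', or 'none' for one application."""
    if status.incoming_request and (not status.running or status.overloaded):
        return "restore"
    if status.running and (status.idle or status.preempted):
        return "snapshot"
    return "none"
```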

FIG. 5 illustrates the logical sequence of steps executed by snapshot driver 218 to make a snapshot image of a process. Beginning at step 250, a snapshot image of a runnable application is requested. The AID is looked up (decision step 252) in a table in memory 154 containing a list of every AID present on computer 150. If the AID is not found, control returns at step 254. However, if the AID is found, control continues to decision step 256 where the snapshot/restore framework 200 searches for a process belonging to the application having the matched AID. If a process is found, control continues to step 258, where the process is suspended. For a process to be snapshotted, it must be completely suspended with no activity present and no ambiguous state (i.e., it must not be in a transitory state). Since a process may be represented by asynchronous threads of activity in the operating system that are not formally part of the process state, any activity in the operating system that is executing on behalf of the process must be stopped (e.g., disk I/O activity). In other words, there may be moments where temporarily a process cannot be snapshotted. This is a finite and short period of time, but it can occur. If the state is consistent and the threads are quiesced (decision step 260), control loops to step 256 and the remaining processes belonging to the application are located and suspended. However, if a process is located that does not have a consistent state or a thread is not quiesced, suspended processes are resumed and the snapshot cannot be completed.

Once all related processes are suspended, for each state of each suspended process, the state is checked to see if it is virtualized (step 264). A virtualized state is any process state that reflects a virtualized resource. If the state is virtualized, it is retrieved at step 266; otherwise the non-virtualized state is retrieved at step 268. State retrieval is performed as described above by the snapshot driver 218 querying the application snapshot/restore framework 200, operating system API interfaces 210, and process management subsystem 216. If the state has changed since the last snapshot (step 270), the new state is recorded. Control then loops to step 264 and executes through the above sequence of steps until all states of all processes are checked. Once completed, control proceeds to step 278, where the registered global state, such as semaphores, is removed. Registered global state is state that is not specifically associated with any one process (i.e., it is not private state). Global state is usually exported (accessible) to all processes and its state can be modified (shared) by all processes. Control proceeds to step 280, where the process is terminated. If there are remaining processes (step 282), these are also terminated. This sequence of steps is concluded to create a snapshot image which is stored as a file and made available for transmission to another computer within public computer network 100 or private computer network 106.
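The control flow of FIG. 5 can be sketched over plain dictionaries. Everything in this sketch is an assumption made for illustration: the `snapshot` function, the layout of `app_table`, and the `consistent` flag are hypothetical, and real suspension, state retrieval, and termination would go through the kernel and the process management subsystem.

```python
def snapshot(aid, app_table, previous=None):
    """Sketch of FIG. 5: return a snapshot image dict, or None on failure.

    app_table maps AID -> list of process dicts (a hypothetical layout):
      {"pid": int, "consistent": bool, "suspended": bool, "states": {name: value}}
    previous is the image from the last snapshot, used to record only deltas.
    """
    processes = app_table.get(aid)
    if processes is None:          # AID not found (step 254)
        return None

    suspended = []
    for proc in processes:         # steps 256 through 260
        proc["suspended"] = True
        suspended.append(proc)
        if not proc["consistent"]:
            # Inconsistent state or unquiesced thread: resume and give up.
            for p in suspended:
                p["suspended"] = False
            return None

    previous = previous or {}
    image = {}
    for proc in processes:         # steps 264 through 270
        old = previous.get(proc["pid"], {})
        # Record only state that is new or changed since the last snapshot.
        image[proc["pid"]] = {
            name: value
            for name, value in proc["states"].items()
            if name not in old or old[name] != value
        }

    del app_table[aid]             # processes terminated (steps 280 and 282)
    return image
```

The delta comparison mirrors the statement that only state changed since the last snapshot is recorded; removal of registered global state (step 278) is omitted from the sketch.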

FIG. 6 illustrates the sequence of steps executed by the restore driver 220 to restore a snapshot image. The snapshot image is accessed via a shared storage mechanism and a restore call is made at step 300. The restore driver 220 looks up the AID for the snapshot image (decision step 302); if it is not found, control returns and the snapshot image cannot be restored. However, if the AID is found, control continues to decision step 304 where, if the snapshotted image matching the AID is located, the global/shared state for each process associated with the snapshot is found. Control then continues to step 308, where remaining global or shared state for the processes is recreated. Since global and shared state is not associated with a single process and may be referenced by multiple processes, it is created first. Recreating the state entails creating a global resource that is functionally identical to the resource at the time of the snapshot. For example, if during a snapshot a semaphore with ID 5193 is found with a value of 7, then to recreate the state at restore time a new semaphore must be created having the exact same ID as before (i.e., 5193) and the same state (i.e., value 7) as before. Then, for each image, a process is created that inherits the global/shared state restored in step 308, and each created process is isolated to prevent inter-process state changes. When a process is being restored, process state is being registered with the kernel, inter-process mechanisms are being restored and reconnected, and I/O buffers in the kernel may be being restored. Some of these actions in one process may have the unintended side effect of disturbing another process that is also being restored. For example, if an I/O buffer is present in the operating system as a result of process x performing a write to a socket connection, then process y could unintentionally be delivered an asynchronous signal that notifies it of I/O being present (for reading) prior to the process being fully restored. At step 314, for each type of state within the processes, the process-private resources are recreated to their state at the time the snapshot image was taken. If the state is virtualized (decision step 316), the system state is bound to a virtual definition. As part of the restore, an extra step must be done to create a virtual mapping. This is done by taking the system resource that was created in step 314 and binding it to the virtual definition that was saved during the snapshot in step 266. This allows the application to see a consistent view of resources, since it cannot be guaranteed that at restore time the exact same system resource will be available. If the state is shared with another process, such as via a pipe (decision step 320), the shared state is reconnected with the other process at step 322. If there are more states (decision step 324), steps 314 through 322 are repeated. Once steps 314 through 322 have been executed for all states, control continues to step 326, where the process is placed in synchronized wait. If there are remaining images in the snapshot image (decision step 328), steps 310 through 326 are repeated. Once all images have been processed, control continues to step 330, where traces and states induced during restore of the process are removed, and a synchronized resume of all processes occurs at step 332.
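The ordering constraints of FIG. 6 (global state first, then isolated per-process reconstruction, then a single synchronized resume) can be sketched as follows. The `restore` function and the image layout are hypothetical and exist only to make the ordering visible; virtual binding and reconnection of shared state are left out.

```python
def restore(image):
    """Sketch of the ordering in FIG. 6 over a hypothetical image layout:
      {"global": {resource_id: value},
       "processes": [{"pid": int, "private": {name: value}}]}
    Returns (global_state, processes, log); log records the order of actions.
    """
    log = []

    # Global/shared state is recreated first, with the same IDs and values.
    global_state = dict(image["global"])
    for rid, value in global_state.items():
        log.append(("create_global", rid, value))

    processes = []
    for saved in image["processes"]:
        # Each process is created in isolation and its private state rebuilt.
        proc = {"pid": saved["pid"], "private": {}, "waiting": False}
        log.append(("create_process", saved["pid"]))
        for name, value in saved["private"].items():
            proc["private"][name] = value
        proc["waiting"] = True     # synchronized wait (step 326)
        processes.append(proc)

    # Only after every process is rebuilt are they all resumed together.
    for proc in processes:
        proc["waiting"] = False
    log.append(("resume_all", len(processes)))
    return global_state, processes, log
```

Using the semaphore example from the text, an image whose global state is `{5193: 7}` yields a recreated entry with the same ID and value before any process is created.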

Once steps 300 through 332 have executed without error on whatever computer the restore driver 220 was executed, the restored application can continue to run without interruption. Thus, the present invention avoids the overhead and delay of shutting down an application, storing data to a separate file, moving both the application and data file elsewhere, and restarting the program.

B. Snapshot Virtual Templating

In another aspect, the present invention provides a system, method, and computer program product for creating snapshot virtual application templates for the purpose of propagating a single application snapshot into multiple distinct instances. Snapshot virtual templates allow multiple application instances to use the same fixed resource ID (“RID”) by making the resource ID virtual, privatizing the virtual RID, and dynamically mapping it to a unique system resource ID. A RID is the identifier assigned to represent a specific system resource and acts as a handle when referencing that system resource. Anonymous resources are resources that are read-only or functionally isolated from other applications. Anonymous resources are also shareable resources. An anonymous resource is a non-fixed resource allocated by the operating system and identified by a per-process handle. These are functionally isolated since the operating system allocates them anonymously and one is as good as another. Examples of this are non-fixed TCP ports or file descriptors. A resource is said to be network-wide unique if there can only be one instance of that resource with its corresponding identifier on a computer network or subnetwork. An example of this is a network IP address (e.g., 10.1.1.1). Snapshot virtual templates allow snapshots to be described in a manner that separates shareable data from non-shareable data. Data is loosely defined to mean any system resource (memory, files, sockets, handles, etc.). When a snapshot is cloned from a virtual template, the common or shared data is used exactly as is, whereas the non-shareable data is either copied-on-write, multiplexed, virtualized, or customized-on-duplication. The present invention greatly reduces the required administrative setup per application instance. Snapshot virtual templating works by noting access to modified resources, fixed system IDs/keys and unique process-related identifiers and automatically inserting a level of abstraction between these resources and the application. The resources contained in a snapshot virtual template can be dynamically redirected at restore time. Access to memory and storage is managed in a copy-on-write fashion. System resource handles are managed in a virtualize-on-allocate fashion or by a multiplex-on-access mechanism. Process-unique resources are managed in a redirect-on-duplicate fashion. Rules may be defined through an application configurator that allows some degree of control over the creation of non-shareable data.

The application configurator is a software component that resides in the application domain and communicates configuration information about the application on its behalf, such as the DSL specifications. Since this component operates without assistance from the application, it may exist in the form of an application library, or may be placed in the application's environment (via inheritance at execution time), or it can be implemented as a server process that proxies application information to the operating system as necessary.

A resource duplicator is a software component that fields requests for non-shareable resources and duplicates or virtualizes resources so that applications receive their own private copies and can co-exist transparently with multiple instances of the same application forged from the same virtual template. The resource duplicator also processes duplication rules fed by the application configurator or application snapshot/restore framework 200.

As used herein, non-shareable data refers to any resource that is modified and globally visible to other application instances (e.g., files). Process-related identifiers that are system-wide unique are also non-shareable since conflicts will arise if two instances use the same identifier at the same time (uniqueness is no longer preserved). References to unique resources by fixed handles (e.g., fixed TCP port numbers or IPC keys) are also not shareable. Memory pages that are specific to an application instance (e.g., the stack) are another example of a non-shareable resource. For illustrative purposes, examples of non-shareable data include application config files that must be different per application instance as well as modified application data files if the application is not written to run multiple copies simultaneously. Other examples include stack memory segments or heap segments, shared memory keys that are a fixed value, usage of fixed well-known (to the application) TCP port numbers, and process identifiers (two distinct processes cannot share the same PID).

The snapshot virtual template is constructed automatically by dividing a snapshot process into shareable and non-shareable data. The knowledge of which system resources can be shared is encoded in the snapshot virtual templating framework itself. If an application has non-shareable internal resources (as opposed to system resources), it may not be possible to construct a virtual template for that application.

Snapshot virtual templates are node-independent as well as application-instance dependent. Snapshot virtual templates cannot be created for applications that use non-shareable physical devices. Snapshot virtual templates must save references to non-shareable resources in their pre-customized form, rather than their evaluated form. All access by an application to non-shareable resources must be via the operating system. Internal or implicit dependencies by the application itself cannot be virtually templated. A snapshot virtual template may be created from an application instance that was originally forged from a different virtual template.

Snapshot virtual templating is an alternate method of creating an application instance. The snapshot restore method described above requires creating unique instances of an application to create unique “snapshots” of that application. Virtual templating allows the creation of a generic application instance from which unique instances may be spawned. Every unique instance that is created from the original virtual template starts out as an exact copy (referred to herein as “clone”) but has been personalized just enough to make it a fully-functioning independent copy. Differences between copies may be due to the way resources are named or identified.

FIG. 7 illustrates in block diagram form the contents of a snapshot virtual template. The main components are resource name size, resource descriptor size, resource type, resource name, and resource data. Resource data includes many different types of information, as illustrated.
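One way to picture a record with these five components is a fixed header followed by two variable-length fields. The sketch below is an assumption, not the format of FIG. 7: the field widths (three little-endian 32-bit integers), the interpretation of the resource descriptor size as the byte length of the resource data, and the function names `pack_record` and `unpack_record` are all hypothetical.

```python
import struct

# Assumed header: name size, descriptor size, resource type (32 bits each).
HEADER = struct.Struct("<III")


def pack_record(rtype: int, name: str, data: bytes) -> bytes:
    """Serialize one resource as header + name bytes + data bytes."""
    encoded = name.encode("utf-8")
    return HEADER.pack(len(encoded), len(data), rtype) + encoded + data


def unpack_record(buf: bytes):
    """Parse a record produced by pack_record; returns (rtype, name, data)."""
    name_size, data_size, rtype = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    name = buf[start:start + name_size].decode("utf-8")
    data = buf[start + name_size:start + name_size + data_size]
    return rtype, name, data
```

Storing the two sizes ahead of the variable-length fields lets a reader skip over records whose resource type it does not need to interpret.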

FIG. 8 describes the sequence of steps executed by the application snapshot/restore framework 200 to create a snapshot virtual template. As the application runs, every request for a new operating system resource (file, memory, semaphore, etc.) is checked for an existing rule. When the application is started under the virtual templating framework, a set of rules may be supplied at that time. The rule will state the type of resource, the type of access (e.g., create, read, write, destroy, etc.), and the action to be taken. If a rule is found (decision step 360), the rule is saved as part of the process state and recorded with the resource as auxiliary state at step 362. Rules may be added to the template that control the creation of application-instance specific resources, for example, environment variables or pathnames that incorporate an AID to differentiate and customize a particular resource among multiple instances. The following syntax was created for illustration purposes:

    Define <APPL-ID> as PROPERTY application-ID
    REDIR PATH “/user/app/config” to “/usr/app/<APPL-ID>/config”
    SET ENV “HOME”=“/usr/app/<APPL-ID>”
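To show how such rules might be evaluated for one instance, the following sketch substitutes a concrete application ID into the placeholder used by the illustrative syntax. The functions `expand` and `apply_rules` are hypothetical; the text does not define a parser or evaluator for the rule syntax, so the rules are written directly as Python data here.

```python
def expand(template: str, properties: dict) -> str:
    """Replace each <NAME> placeholder with its property value."""
    for name, value in properties.items():
        template = template.replace(f"<{name}>", value)
    return template


def apply_rules(app_id: str):
    """Evaluate the three illustrative rules for one application instance."""
    props = {"APPL-ID": app_id}  # Define <APPL-ID> as PROPERTY application-ID
    redirects = {"/user/app/config": expand("/usr/app/<APPL-ID>/config", props)}
    env = {"HOME": expand("/usr/app/<APPL-ID>", props)}
    return redirects, env
```

Keeping the rule in its unexpanded form and substituting only at clone time is what allows one template to serve many instances.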

If rules are created, they should also be specified via the application configurator. If no rule is found, the resource is checked at step 364 using a standard set of criteria that determine whether the resource needs to be abstracted or virtualized in order to be cloned. The criteria are checked at steps 366, 370, 372, 378, 380 and 386. In most cases, no action is taken. Resources are simply classified into their correct types so that when an instance is cloned the correct action can be taken. If the resource is shared, e.g., shared memory (decision step 366), the resource is marked as shared (step 368) so that during the subsequent snapshot all references to the shared object will be noted. If the resource can be modified (decision step 370), it must be isolated from the original during cloning so that the original remains untouched. If the resource is a large object and has a notion of an underlying object, such as mapped memory (decision step 372), it is marked for copy-on-write (step 374). Otherwise, the entire resource must be duplicated and marked accordingly (step 376). A resource is said to be systemwide unique if the identifier used to represent that resource cannot represent more than one instance of that resource at a single point in time on the same node or computer. If the resource is systemwide unique (decision step 378), and is exported as an external interface, as is the case when another client application that is not running on the platform has a dependency on the resource, such as a TCP port number (decision step 380), it isn't feasible to virtualize access to the resource, so it is marked to be multiplexed (step 382). Multiplexing allows multiple independent connections to transparently co-exist over the same transport using only a single identifiable resource. If it isn't externally exported, the resource is marked for virtualization at step 384. Continuing to decision step 386, if the resource is network-unique, it is marked for allocation at step 388. Control proceeds to step 390, where the resource request is processed. Steps 360 through 390 are repeated for every resource request occurring during application execution.
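The classification criteria of steps 366 through 388 can be sketched as a function that returns the marks applied to one resource. The attribute names on the `resource` dictionary and the mark strings are hypothetical, and the sketch assumes no rule was found for the resource (the step 364 path).

```python
def classify(resource: dict) -> list:
    """Return the list of marks for one resource, following steps 366-388."""
    marks = []
    if resource.get("shared"):                     # decision step 366
        marks.append("shared")
    if resource.get("modifiable"):                 # decision step 370
        if resource.get("has_underlying_object"):  # decision step 372
            marks.append("copy-on-write")
        else:
            marks.append("duplicate")
    if resource.get("systemwide_unique"):          # decision step 378
        if resource.get("externally_exported"):    # decision step 380
            marks.append("multiplex")
        else:
            marks.append("virtualize")
    if resource.get("network_unique"):             # decision step 386
        marks.append("allocate")
    return marks
```

For instance, a fixed TCP port that outside clients depend on would be marked for multiplexing, while a fixed shared memory key that is not externally exported would be marked for virtualization.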

FIG. 9 illustrates the sequence of steps executed to perform cloning or replication of a process from a snapshot virtual template. This sequence of steps can be performed by a replication program that creates a new snapshot image from an existing template, or by the restore driver 220. When an application instance is restored from a snapshot that is a virtual template, a new instance is automatically cloned from the template using the rules that were gathered during the creation of the template. For every resource included in the snapshot virtual template, rules for the resource and access type are looked up. Any resource that requires special handling as part of the templating effort has the rule described inside the snapshot template as part of the auxiliary state associated with the resource. If no rule is found (decision step 400), the resource is recreated using the existing saved information in the snapshot (step 402). Otherwise, if a resource is marked for duplicate (decision step 404), then a copy of the original resource is made at step 406. If a resource is marked for copy-on-write (decision step 408), then at step 410 a reference to the original underlying object (in the original template) is kept, and any modifications to the original force a copy-on-write action so that the modifications are kept in an application-instance private location and the two form a composite view that is visible to the application instance.

If a resource is marked for virtualization (decision step 412), the original resource is allocated or duplicated in blank form at step 414. At step 416, the resource is mapped dynamically to the new resource at run-time by binding the system resource to the saved resource in the snapshot image. If a resource is marked for multiplexing (decision step 418), the original resource is duplicated and then spliced among other application instances that share it (step 420). If the resource is a network-unique resource (decision step 422), a unique resource must be allocated (step 424) by communicating with another component of the network, e.g., a network map or registry, that assigns a resource to this instance. This new resource is then bound to the fixed resource that was saved in the virtual template (step 426), in a manner similar to virtualization.

C. Virtual Resource ID Mapping

The present invention provides virtual mapping of system resource identifiers in use by a software application for the purpose of making the running state of an application node-independent. By adding a layer of indirection between the application and the resource, new system resources are reallocated and can then be mapped to the application's existing resource requirements while it is running, without the application detecting a failure or change in resource handles.

This layer of indirection makes the application's system RID transparent to the application. RID's are usually numeric in form, but can also be alphanumeric. RID's are unique to a machine, and can be reused once all claims to a specific RID have been given up. Some examples of RID's include process ID's, shared memory ID's, and semaphore ID's. Only the virtual RID is visible to the application. Conversely, the virtual RID is transparent to the OS, and only the system RID is visible to the OS. Every application has a unique identifier, the AID, that distinguishes it from every other running application. There exists a one-to-one mapping between the AID:resource type:virtual RID combination and the node ID:system RID. Virtual RID's are only required to be unique within their respective applications and, along with their corresponding system RID's, may be shared among multiple programs and processes that have the same AID. System RID's that have been virtualized are accessed through their virtual ID's to ensure consistent states.

AID's are farm-wide unique resources and are allocated atomically by the AID generator. Because in the present invention applications aren't uniquely bound to specific names, process ID's, machine hostnames or points in time, the AID is the sole, definitive reference to a running application and its processes. Typically, an AID is defined in reference to a logical task that the application is performing or by the logical user that is running the application.

Virtual resource mapping comprises several basic steps: application registration, allocation of the RID, and resolution of the RID. During registration of the application, the AID is derived if it was preallocated or the application existed previously, or it may be allocated dynamically by an AID generator. The AID is then made known for later use. Allocation of a RID happens when an application requests access to a system resource (new or existing) and the OS returns a handle to a resource in the form of a RID. The virtual resource layer intercepts the system-returned RID, allocates a virtual counterpart by calling the resource-specific resource allocator, establishes a mapping between the two, and returns a new virtual RID to the application.

Resolution of a RID may occur in two different directions. A RID may be passed from the application to the OS, in which case the RID is mapped from virtual ID to system ID. Conversely, the RID may be passed from the OS to the application, in which case the transition is from system ID to virtual ID. Requests for translation are passed from the framework to the virtual RID translation unit and the corresponding mapping is returned once it has been fetched from the appropriate translation table. Multiple translation tables may exist if there are multiple resource types.
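
The two directions of resolution can be sketched in Python as a pair of dictionaries kept in step with each other, with one table per resource type. The class name `TranslationTable`, its method names, and the sample identifiers are hypothetical and serve only to illustrate the idea; the specification does not prescribe a data structure.

```python
class TranslationTable:
    """Hypothetical two-way map between virtual and system RID's for one resource type."""

    def __init__(self):
        self._to_system = {}   # (aid, virtual_rid) -> system_rid
        self._to_virtual = {}  # (aid, system_rid) -> virtual_rid

    def insert(self, aid, virtual_rid, system_rid):
        """Record a one-to-one translation for the given AID."""
        self._to_system[(aid, virtual_rid)] = system_rid
        self._to_virtual[(aid, system_rid)] = virtual_rid

    def to_system(self, aid, virtual_rid):
        """Application-to-OS direction: virtual ID to system ID."""
        return self._to_system[(aid, virtual_rid)]

    def to_virtual(self, aid, system_rid):
        """OS-to-application direction: system ID to virtual ID."""
        return self._to_virtual[(aid, system_rid)]


# One table per resource type, as the text allows.
tables = {"semaphore": TranslationTable(), "shared_memory": TranslationTable()}
tables["semaphore"].insert("dbserver", virtual_rid=3, system_rid=1017)
```

Keeping both dictionaries avoids a linear scan for the reverse lookup, at the cost of storing each translation twice.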

FIG. 10 illustrates the steps executed to register an application. The AppShot harness 500 exists to aid in the creation of the appropriate runtime environment for the application prior to launching the application. The AppShot harness 500 initializes the runtime environment for an application by first priming its own environment with the appropriate settings and then launching the application, which inherits all these settings. The AppShot harness 500 is provided because the application cannot be recompiled or rewritten to initialize its own environment. Some of the settings that are established by the AppShot harness 500 include the AID, assigned process ID range, DSL specifications, application virtual ID's, and snapshot templating rules. The DSL specifications are registered as part of the environment. A process is an in-memory instantiation of a software program. Process_(x) is the in-memory image of the AppShot harness 500 and process_(y) is the in-memory image of the application. At step 510, the AppShot harness 500 registers an AID a_(i), such as “dbserver,” with the application snapshot/restore framework 200 within the OS kernel 206. The application snapshot/restore framework 200 then creates virtual translation tables 502 for the AID at step 512. Virtual translation tables 502 are data units that contain translation information related to RID's, such as AID's or process ID's, virtual RID's, and system RID's. Separate tables can be implemented per resource type or a table can be shared if a unique resource type is stored per table entry. A translation unit maps the system RID's to the virtual RID's by storing and fetching translation information in the appropriate translation table. Once the virtual translation tables 502 are created, process_(x) is linked to the AID a_(i) at step 514. At this point, process_(y) is created when the AppShot harness 500 launches the application at step 516. Process_(y) then inherits process_(x)'s link to a_(i) and its tables at step 518.

FIG. 11 shows the allocation of a virtual resource such as a semaphore in accordance with the present invention. At step 520, an application requests that a semaphore resource be allocated for its process_(y). In response (step 522), the application snapshot/restore framework 200 looks up the AID in memory 154, and returns AID a_(i). At step 524, in response to a request from the application snapshot/restore framework 200, the system semaphore pool returns semaphore s_(i). At step 526, the application snapshot/restore framework 200 scans the virtual resource translation table 502 for an available slot and allocates the virtual semaphore. At step 528, the application snapshot/restore framework 200 inserts the translation s_(i)=a_(i):v₃ and the virtual resource translation table 502 now contains the mapping. At step 530, the virtual semaphore v₃ is returned to the application.
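
Steps 526 through 530 can be sketched in Python as follows. The sketch assumes, purely for illustration, that the translation table is a dictionary keyed by an (AID, virtual RID) pair and that an available slot is the lowest unused non-negative integer for that AID. The function name `allocate_virtual` and the sample system RID values are hypothetical.

```python
def allocate_virtual(table, aid, system_rid):
    """Allocate a virtual RID for system_rid and record the translation.

    table maps (aid, virtual_rid) -> system_rid.
    """
    virtual_rid = 0
    while (aid, virtual_rid) in table:       # step 526: scan for an available slot
        virtual_rid += 1
    table[(aid, virtual_rid)] = system_rid   # step 528: insert the translation
    return virtual_rid                       # step 530: return the virtual RID


# Slots 0 through 2 are already in use by the application "dbserver".
table = {("dbserver", 0): 910, ("dbserver", 1): 911, ("dbserver", 2): 912}
v = allocate_virtual(table, "dbserver", system_rid=1017)
```

With three slots already taken, the new semaphore receives virtual RID 3, which corresponds to v₃ in the figure. A different AID would start again from slot 0, since virtual RID's need only be unique within an application.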

FIG. 12 illustrates translation of a virtual resource to a system resource in accordance with the present invention. At step 532, the application calls the semaphore interface and supplies the virtual RID v₃ to the application snapshot/restore framework 200. At step 534, the application snapshot/restore framework 200 looks up the AID for the calling application and returns a_(i). At step 536, the application snapshot/restore framework 200 then looks up the translation for a_(i):v₃ in the virtual resource translation table 502, which returns s_(i) at step 538. At step 540, the application snapshot/restore framework 200 forwards the application's request to the OS semaphore implementation, substituting s_(i) for v₃.

FIG. 13 illustrates translation of a system resource to a virtual resource. Beginning at step 542, the application calls the semaphore interface and expects the RID as a result. At step 544, the application snapshot/restore framework 200 looks up the AID for the calling application and returns a_(i). At step 546, the application snapshot/restore framework 200 forwards the application request to the OS semaphore implementation, which returns the system semaphore s_(i) at step 548. At step 550, the application snapshot/restore framework 200 then looks up the translation for a_(i):s_(i) in the virtual resource translation table 502, which returns v₃ at step 552. At step 554, the application snapshot/restore framework 200 returns the virtual RID v₃ to the calling application.

FIG. 14 illustrates the logical sequence of steps executed to create the virtual translation table 502. Beginning at step 556, an attempt is made to register AID a_(i). If the AID hasn't already been registered (decision step 558), a virtual resource translation table space for a_(i) is created in table 502 (step 560). Translation tables are then added for each type of resource associated with the application at step 562. At step 564, process_(x) is linked to a_(i). At step 566, process_(y) is created and the application is launched. Steps 568 and 570 show the parallel paths of execution. The flow of control continues on to step 568 and halts shortly thereafter. The new flow of execution continues on from step 566 to step 570, where the process inherits context from the process at step 568.

FIG. 15 illustrates in greater detail the sequence of steps executed to translate a virtual resource. For illustrative purposes, the resource in this example is a semaphore. Beginning at decision step 580, if an AID is found upon lookup, and the interface uses the RID as a parameter (decision step 582), the application snapshot/restore framework 200 performs a lookup of the translation for a₁:v₁ at step 584. If a system resource s₁ is found, the system RID is substituted for the virtual RID at step 586 and passed to the semaphore interface of the OS 206 (step 588). If the semaphore was not allocated by the semaphore interface (decision step 590), and the interface returns a semaphore (decision step 592), control proceeds to step 594, where a reverse lookup for the translation of the AID with a system RID is performed. The returned virtual RID is then substituted for the system RID at step 596. Returning to decision step 590, if a semaphore was allocated by the semaphore interface, control proceeds to step 598, where the virtual semaphore is allocated, and a translation for v₂=a₁:s₂ is inserted into the translation table 502 at step 600. The virtual RID v₂ is then substituted for s₂ at step 602.
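
The flow of FIG. 15 can be sketched as a wrapper around a single semaphore interface call. The sketch is illustrative only: the function `intercepted_call`, its parameters, and the two dictionaries are hypothetical, the AID lookup of decision step 580 is assumed to have succeeded, and the caller is assumed to know whether the interface allocates or merely returns a semaphore.

```python
def intercepted_call(v2s, s2v, aid, os_call, virtual_rid=None,
                     allocates=False, returns_rid=False):
    """Hypothetical wrapper around one semaphore interface call.

    v2s maps (aid, virtual_rid) -> system_rid.
    s2v maps (aid, system_rid) -> virtual_rid.
    """
    args = []
    if virtual_rid is not None:                # decision step 582
        args.append(v2s[(aid, virtual_rid)])   # steps 584-586: substitute system RID
    result = os_call(*args)                    # step 588: call the OS interface
    if allocates:                              # decision step 590
        new_virtual = 0
        while (aid, new_virtual) in v2s:       # step 598: allocate a virtual RID
            new_virtual += 1
        v2s[(aid, new_virtual)] = result       # step 600: insert the translation
        s2v[(aid, result)] = new_virtual
        return new_virtual                     # step 602: hand back the virtual RID
    if returns_rid:                            # decision step 592
        return s2v[(aid, result)]              # steps 594-596: reverse lookup
    return result


v2s, s2v = {}, {}
# Stand-in for an OS call that allocates a semaphore with system RID 7001.
v = intercepted_call(v2s, s2v, "a1", lambda: 7001, allocates=True)
```

In this sketch the application only ever sees `v`, while the stand-in OS call only ever sees the system RID.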

In another aspect, the present invention provides communication between at least two applications through virtual port multiplexing. The communication is achieved by accepting a connection from a second application on a first port and allocating a second port to receive the communication from the second application. Once the second port has been allocated, the second port translation is recorded. The communication is sent to the first port from the second application and received on the second port. The communication is then delivered to a first application from the second port. In one embodiment, the first application requests the communication from the first port and the first port is translated to determine the second port such that the communication is delivered to the first application in the step of delivering the communication to the first application.

In one embodiment, the communication is received on the first port following the step of sending the communication to the first port, the first port is translated to determine the second port prior to the step of receiving the communication on the second port, and the step of receiving the communication on the second port includes queuing the communication on the second port from the first port.

In one embodiment, the second application requests to connect with the first port prior to the step of accepting the connection. Once the second port is allocated, the second port is negotiated, including negotiating the second port between a first and a second virtual port multiplexer. Further, the second application is connected with the second port following the step of allocating the second port. The step of recording the translation includes, first, recording the translation of the second port in association with the first application, and second, recording the translation of the second port in association with the second application.
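
The port translation and queuing described above can be modeled with a small in-memory Python sketch that uses no real sockets. The class `VirtualPortMultiplexer`, its methods, the port numbers, and the choice to key translations by a peer name are all assumptions made for illustration. Negotiation between two multiplexers is not modeled.

```python
from collections import deque


class VirtualPortMultiplexer:
    """Hypothetical in-memory model of one multiplexer on a well-known first port."""

    def __init__(self, first_port, second_port_pool):
        self.first_port = first_port
        self._free_ports = deque(second_port_pool)
        self._translation = {}  # (first_port, peer) -> second_port
        self._queues = {}       # second_port -> queued communications

    def accept(self, peer):
        """Accept a connection on the first port and allocate a second port."""
        second_port = self._free_ports.popleft()
        self._translation[(self.first_port, peer)] = second_port  # record translation
        self._queues[second_port] = deque()
        return second_port

    def send(self, peer, data):
        """Peer sends to the first port; data is queued on the second port."""
        second_port = self._translation[(self.first_port, peer)]
        self._queues[second_port].append(data)

    def receive(self, peer):
        """The first application reads from the first port; the port is translated."""
        second_port = self._translation[(self.first_port, peer)]
        return self._queues[second_port].popleft()


mux = VirtualPortMultiplexer(first_port=80, second_port_pool=[5001, 5002])
port_a = mux.accept("client-a")
port_b = mux.accept("client-b")
mux.send("client-a", b"hello")
mux.send("client-b", b"world")
```

Both peers address the same first port, yet each communication is queued on and delivered from its own second port, so the two connections remain independent.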

The present invention also provides for a dynamic symbolic link (DSL) and the resolution of that DSL. The pathname of a first application is renamed to a target pathname, a variable is defined within the target pathname, the first pathname is defined as a symbolic link, and the symbolic link is associated with a virtual pathname. A specification is further defined that is associated with the virtual pathname, including associating the variable with the virtual pathname. In associating the symbolic link with the virtual pathname, a declaration is defined within the virtual pathname.

Having disclosed exemplary embodiments and the best mode, modifications and variations may be made to the disclosed embodiments while remaining within the scope of the present invention as defined by the following claims.

1-6. (canceled)
7. A method comprising: determining that a snapshot of an application is to be made during execution of the application on a computer system; and generating a snapshot image including an application state corresponding to the application, wherein the application state includes a state of one or more interprocess communication (IPC) mechanisms in use by the application.
8. The method as recited in claim 7 wherein the IPC mechanisms comprise a shared memory.
9. The method as recited in claim 7 wherein the IPC mechanisms comprise a semaphore.
10. The method as recited in claim 7 wherein the IPC mechanisms comprise a pipe.
11. The method as recited in claim 7 wherein the IPC mechanisms comprise a socket.
12. The method as recited in claim 7 wherein the application state further comprises a plurality of memory pages used by the application.
13. The method as recited in claim 12 further comprising: determining that a second snapshot of the application is to be made during execution of the application; and generating a second snapshot image including the application state.
14. The method as recited in claim 13 wherein the plurality of memory pages in the second snapshot image are pages that have been modified by the application between generating the snapshot image and generating the second snapshot image.
15. The method as recited in claim 7 further comprising: determining that the snapshot image is to be restored; restoring the snapshot image; and continuing execution of the application from the application state restored from the snapshot image.
16. The method as recited in claim 15 wherein restoring the snapshot image and continuing execution of the application are performed on a different computer system from the computer system on which the snapshot image was generated.
17. The method as recited in claim 7 further comprising: during execution of the application, intercepting one or more events generated by the application for service by an operating system executing on the computer system, wherein the one or more events cause changes in the application state; and recording the intercepted events.
18. The method as recited in claim 17 wherein at least a portion of the intercepting is performed by a software module that executes in a privileged mode, wherein the operating system also executes in the privileged mode.
19. The method as recited in claim 17 wherein at least a portion of the intercepting is performed by a software module that executes in a user mode, wherein the application also executes in the user mode.
20. The method as recited in claim 17 wherein generating the snapshot image comprises querying one or more software modules that perform the intercepting to capture the changes in the application state captured from the intercepted events.
21. The method as recited in claim 20 further comprising querying one or more operating system application programming interfaces (APIs) to capture a portion of the application state.
22. The method as recited in claim 20 further comprising querying a process management subsystem to capture a portion of the application state.
23. A computer accessible medium storing a plurality of instructions comprising instructions which, when executed, implement a method comprising: generating a snapshot image including an application state corresponding to the application, wherein, if the application uses one or more interprocess communication (IPC) mechanisms, the application state includes a state of the one or more IPC mechanisms.
24. The computer accessible medium as recited in claim 23 wherein the IPC mechanisms comprise a shared memory.
25. The computer accessible medium as recited in claim 23 wherein the IPC mechanisms comprise a semaphore.
26. The computer accessible medium as recited in claim 23 wherein the IPC mechanisms comprise a pipe.
27. The computer accessible medium as recited in claim 23 wherein the IPC mechanisms comprise a socket.
28. The computer accessible medium as recited in claim 23 wherein the application state further comprises a plurality of memory pages used by the application.
29. The computer accessible medium as recited in claim 28 wherein the method further comprises generating a second snapshot image including the application state.
30. The computer accessible medium as recited in claim 29 wherein the plurality of memory pages in the second snapshot image are pages that have been modified by the application between generating the snapshot image and generating the second snapshot image.
31. The computer accessible medium as recited in claim 23 wherein the method further comprises restoring the snapshot image, wherein execution of the application is continued from the application state restored from the snapshot image.
32. The computer accessible medium as recited in claim 31 wherein restoring the snapshot image and continuing execution of the application are performed on a different computer system from the computer system on which the snapshot image was generated.
33. The computer accessible medium as recited in claim 23 wherein the method further comprises: during execution of the application, intercepting one or more events generated by the application for service by an operating system executing on the computer system, wherein the one or more events cause changes in the application state; and recording the intercepted events.
34. The computer accessible medium as recited in claim 33 wherein at least a portion of the intercepting is performed by instructions that execute in a privileged mode, wherein the operating system also executes in the privileged mode.
35. The computer accessible medium as recited in claim 33 wherein at least a portion of the intercepting is performed by instructions that execute in a user mode, wherein the application also executes in the user mode.
36. The computer accessible medium as recited in claim 33 wherein generating the snapshot image comprises querying one or more software modules that perform the intercepting to capture the changes in the application state captured from the intercepted events.
37. The computer accessible medium as recited in claim 36 wherein the method further comprises querying one or more operating system application programming interfaces (APIs) to capture a portion of the application state.
38. The computer accessible medium as recited in claim 36 wherein the method further comprises querying a process management subsystem to capture a portion of the application state.
39. A computer system comprising: an application; and a snapshot driver module configured, during execution of the application on the computer system, to generate a snapshot image including an application state corresponding to the application, and wherein, if the application uses one or more interprocess communication (IPC) mechanisms, the application state includes a state of the one or more IPC mechanisms.
40. The computer system as recited in claim 39 wherein the IPC mechanisms comprise a shared memory.
41. The computer system as recited in claim 39 wherein the IPC mechanisms comprise a semaphore.
42. The computer system as recited in claim 39 wherein the IPC mechanisms comprise a pipe.
43. The computer system as recited in claim 39 wherein the IPC mechanisms comprise a socket.
44. The computer system as recited in claim 39 wherein the application state further comprises a plurality of memory pages used by the application.
45. The computer system as recited in claim 44 wherein the snapshot driver module is further configured to generate a second snapshot image including the application state.
46. The computer system as recited in claim 45 wherein the plurality of memory pages in the second snapshot image are pages that have been modified by the application between generating the snapshot image and generating the second snapshot image.
47. The computer system as recited in claim 39 further comprising a restore driver module configured to restore the snapshot image, and wherein the application is configured to continue execution from the application state restored from the snapshot image.
48. The computer system as recited in claim 39 further comprising one or more software modules configured to intercept one or more events generated by the application during execution of the application, the one or more events generated by the application for service by an operating system executing on the computer system, wherein the one or more events cause changes in the application state, and wherein the one or more software modules are configured to record the intercepted events.
49. The computer system as recited in claim 48 wherein at least one of the one or more software modules executes in a privileged mode, wherein the operating system also executes in the privileged mode.
50. The computer system as recited in claim 48 wherein at least one of the one or more software modules executes in a user mode, wherein the application also executes in the user mode.
51. The computer system as recited in claim 48 wherein the snapshot driver module is configured to query the one or more software modules to capture the changes in the application state captured from the intercepted events.